 Gulf of Alaska


Scientists uncover identity of mysterious 'golden orb' discovered miles underwater in 2023

FOX News

The mysterious "golden orb" pulled from over two miles beneath the Gulf of Alaska in 2023 has been identified as a remnant of a rare giant deep-sea anemone, researchers say.


Distillation and Interpretability of Ensemble Forecasts of ENSO Phase using Entropic Learning

Groom, Michael, Bassetti, Davide, Horenko, Illia, O'Kane, Terence J.

arXiv.org Machine Learning

This paper introduces a distillation framework for an ensemble of entropy-optimal Sparse Probabilistic Approximation (eSPA) models, trained exclusively on satellite-era observational and reanalysis data to predict ENSO phase up to 24 months in advance. While eSPA ensembles yield state-of-the-art forecast skill, they are harder to interpret than individual eSPA models. We show how to compress the ensemble into a compact set of "distilled" models by aggregating the structure of only those ensemble members that make correct predictions. This process yields a single, diagnostically tractable model for each forecast lead time that preserves forecast performance while also enabling diagnostics that are impractical to implement on the full ensemble. An analysis of the regime persistence of the distilled model "superclusters", as well as cross-lead clustering consistency, shows that the discretised system accurately captures the spatiotemporal dynamics of ENSO. By considering the effective dimension of the feature importance vectors, the complexity of the input space required for correct ENSO phase prediction is shown to peak when forecasts must cross the boreal spring predictability barrier. Spatial importance maps derived from the feature importance vectors are introduced to identify where predictive information resides in each field and are shown to include known physical precursors at certain lead times. Case studies of key events are also presented, showing how fields reconstructed from distilled model centroids trace the evolution from extratropical and inter-basin precursors to the mature ENSO state. Overall, the distillation framework enables a rigorous investigation of long-range ENSO predictability that complements real-time data-driven operational forecasts.
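The abstract does not define "effective dimension". As an illustration only, the sketch below uses one common definition, the exponential of the Shannon entropy of the normalised importance weights. That choice, and the function name `effective_dimension`, are assumptions and may differ from what the paper computes.

```python
import math


def effective_dimension(importances):
    """Exponential of the Shannon entropy of normalised importance weights.

    Assumed definition for illustration; the paper may use a different one.
    Returns 1.0 when a single feature carries all the weight and n when
    n features are weighted equally.
    """
    if any(w < 0 for w in importances):
        raise ValueError("importances must be non-negative")
    total = sum(importances)
    if total <= 0:
        raise ValueError("at least one importance must be positive")
    # Zero weights contribute nothing to the entropy, so they are skipped.
    probs = [w / total for w in importances if w > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)


uniform = effective_dimension([0.25, 0.25, 0.25, 0.25])
concentrated = effective_dimension([0.0, 1.0, 0.0, 0.0])
```

Under this definition a larger value means the predictive weight is spread over more input features, which is the sense in which the input-space complexity could "peak" at certain lead times.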


Swift: An Autoregressive Consistency Model for Efficient Weather Forecasting

Stock, Jason, Arcomano, Troy, Kotamarthi, Rao

arXiv.org Artificial Intelligence

Diffusion models offer a physically grounded framework for probabilistic weather forecasting, but their typical reliance on slow, iterative solvers during inference makes them impractical for subseasonal-to-seasonal (S2S) applications where long lead times and domain-driven calibration are essential. To address this, we introduce Swift, a single-step consistency model that, for the first time, enables autoregressive finetuning of a probability flow model with a continuous ranked probability score (CRPS) objective. This eliminates the need for multi-model ensembling or parameter perturbations. Results show that Swift produces skillful 6-hourly forecasts that remain stable for up to 75 days, running $39\times$ faster than state-of-the-art diffusion baselines while achieving forecast skill competitive with the operational, numerical IFS ENS. This marks a step toward efficient and reliable ensemble forecasting from medium-range to seasonal scales.
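For readers unfamiliar with the CRPS, the following is a minimal sketch of the standard ensemble estimator for a single scalar forecast, CRPS = mean|x_i - y| - (1/2) mean|x_i - x_j|. It is not Swift's training code; the function name `crps_ensemble` is hypothetical, and how the paper applies the score across grid points and lead times is not stated in the abstract.

```python
def crps_ensemble(members, observation):
    """Ensemble estimate of the CRPS for one scalar observation.

    members: list of ensemble forecast values (floats)
    observation: the verifying value (float)
    Lower is better; a single-member ensemble reduces to absolute error.
    """
    m = len(members)
    if m == 0:
        raise ValueError("ensemble must contain at least one member")
    # Accuracy term: mean absolute error of the members.
    accuracy = sum(abs(x - observation) for x in members) / m
    # Spread term: half the mean absolute difference over all member pairs.
    spread = sum(abs(a - b) for a in members for b in members) / (2 * m * m)
    return accuracy - spread


single = crps_ensemble([1.0], 3.0)
pair = crps_ensemble([0.0, 2.0], 1.0)
```

The spread term rewards ensembles whose dispersion matches their error, which is why the score can be used to train a probabilistic model directly instead of a deterministic one.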


OKG-LLM: Aligning Ocean Knowledge Graph with Observation Data via LLMs for Global Sea Surface Temperature Prediction

Yang, Hanchen, Wang, Jiaqi, Cao, Jiannong, Li, Wengen, Zheng, Jialun, Li, Yangning, Miao, Chunyu, Guan, Jihong, Zhou, Shuigeng, Yu, Philip S.

arXiv.org Artificial Intelligence

Sea surface temperature (SST) prediction is a critical task in ocean science, supporting various applications, such as weather forecasting, fisheries management, and storm tracking. While existing data-driven methods have demonstrated significant success, they often neglect the rich domain knowledge accumulated over the past decades, limiting further advances in prediction accuracy. The recent emergence of large language models (LLMs) has highlighted the potential of integrating domain knowledge for downstream tasks. However, the application of LLMs to SST prediction remains underexplored, primarily due to the challenge of integrating ocean domain knowledge and numerical data. To address this issue, we propose Ocean Knowledge Graph-enhanced LLM (OKG-LLM), a novel framework for global SST prediction. To the best of our knowledge, this work presents the first systematic effort to construct an Ocean Knowledge Graph (OKG) specifically designed to represent diverse ocean knowledge for SST prediction. We then develop a graph embedding network to learn the comprehensive semantic and structural knowledge within the OKG, capturing both the unique characteristics of individual sea regions and the complex correlations between them. Finally, we align and fuse the learned knowledge with fine-grained numerical SST data and leverage a pre-trained LLM to model SST patterns for accurate prediction. Extensive experiments on a real-world dataset demonstrate that OKG-LLM consistently outperforms state-of-the-art methods, showcasing its effectiveness, robustness, and potential to advance SST prediction. The code is available in the online repository.


Advancing Marine Heatwave Forecasts: An Integrated Deep Learning Approach

Ning, Ding, Vetrova, Varvara, Koh, Yun Sing, Bryan, Karin R.

arXiv.org Artificial Intelligence

Marine heatwaves (MHWs), an extreme climate phenomenon, pose significant challenges to marine ecosystems and industries, with their frequency and intensity increasing due to climate change. This study introduces an integrated deep learning approach to forecast short-to-long-term MHWs on a global scale. The approach combines graph representation for modeling spatial properties in climate data, imbalanced regression to handle skewed data distributions, and temporal diffusion to enhance forecast accuracy across various lead times. To the best of our knowledge, this is the first study that synthesizes three spatiotemporal anomaly methodologies to predict MHWs. Additionally, we introduce a method for constructing graphs that avoids isolated nodes and provide a new publicly available sea surface temperature anomaly graph dataset. We examine the trade-offs in the selection of loss functions and evaluation metrics for MHWs. We analyze spatial patterns in global MHW predictability by focusing on historical hotspots, and our approach demonstrates better performance compared to traditional numerical models in regions such as the middle south Pacific, equatorial Atlantic near Africa, south Atlantic, and high-latitude Indian Ocean. We highlight the potential of temporal diffusion to replace the conventional sliding window approach for long-term forecasts, achieving improved prediction up to six months in advance. These insights not only establish benchmarks for machine learning applications in MHW forecasting but also enhance understanding of general climate forecasting methodologies.


Spectral Filters, Dark Signals, and Attention Sinks

Cancedda, Nicola

arXiv.org Artificial Intelligence

Projecting intermediate representations onto the vocabulary, a technique also known as the logit lens, is an increasingly popular interpretation tool for transformer-based LLMs. We propose a quantitative extension to this approach and define spectral filters on intermediate representations based on partitioning the singular vectors of the vocabulary embedding and unembedding matrices into bands. We find that the signals exchanged in the tail end of the spectrum are responsible for attention sinking (Xiao et al. 2023), of which we provide an explanation. We find that the loss of pretrained models can be kept low despite suppressing sizable parts of the embedding spectrum in a layer-dependent way, as long as attention sinking is preserved. Finally, we discover that the representations of tokens that draw attention from many other tokens have large projections on the tail end of the spectrum.
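One way to picture a band-wise spectral filter is sketched below with NumPy. It assumes an unembedding matrix of shape (d_model, vocab), takes its left singular vectors as a basis of the representation space, and splits them into contiguous, equally sized bands. The function name `band_projectors`, the equal-width banding, and the random stand-in matrix are illustrative assumptions, not the paper's construction.

```python
import numpy as np


def band_projectors(unembedding, num_bands):
    """Orthogonal projectors onto bands of left singular vectors.

    unembedding: array of shape (d_model, vocab), assumed full rank
    num_bands: number of contiguous bands, ordered from the largest
               singular values (head) to the smallest (tail)
    """
    # Columns of u are ordered by decreasing singular value.
    u, _, _ = np.linalg.svd(unembedding, full_matrices=False)
    bands = np.array_split(np.arange(u.shape[1]), num_bands)
    return [u[:, idx] @ u[:, idx].T for idx in bands]


rng = np.random.default_rng(0)
w_u = rng.normal(size=(8, 50))   # stand-in for a real unembedding matrix
hidden = rng.normal(size=8)      # stand-in for an intermediate representation

projectors = band_projectors(w_u, num_bands=4)
components = [p @ hidden for p in projectors]  # one filtered signal per band
tail_signal = components[-1]                   # tail end of the spectrum
```

Because the bands partition an orthonormal basis, the filtered components sum back to the original representation, so suppressing a band amounts to dropping its term from that sum.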


Kernel Smoothing, Mean Shift, and Their Learning Theory with Directional Data

Zhang, Yikun, Chen, Yen-Chi

arXiv.org Machine Learning

Directional data consist of observations distributed on a (hyper)sphere, and appear in many applied fields, such as astronomy, ecology, and environmental science. This paper studies both statistical and computational problems of kernel smoothing for directional data. We generalize the classical mean shift algorithm to directional data, which allows us to identify local modes of the directional kernel density estimator (KDE). The statistical convergence rates of the directional KDE and its derivatives are derived, and the problem of mode estimation is examined. We also prove the ascending property of our directional mean shift algorithm and investigate a general problem of gradient ascent on the unit hypersphere. To demonstrate the applicability of our proposed algorithm, we evaluate it as a mode clustering method on both simulated and real-world datasets.
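As a rough illustration of mean shift on the sphere, the sketch below assumes a von Mises-Fisher-type kernel, for which each step replaces the current point by the normalised, kernel-weighted sum of the data points. The function names, the default bandwidth, and the stopping rule are assumptions made for this example and are not taken from the paper.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("cannot normalise the zero vector")
    return [c / norm for c in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def directional_mean_shift(points, start, bandwidth=0.5, max_iter=200, tol=1e-12):
    """Iterate towards a local mode of a directional KDE.

    points: list of unit vectors on the sphere
    start: initial vector (normalised internally)
    Kernel weight for point p at location x: exp((x . p - 1) / bandwidth^2).
    """
    x = normalize(start)
    for _ in range(max_iter):
        weights = [math.exp((dot(x, p) - 1.0) / bandwidth ** 2) for p in points]
        shifted = [sum(w * p[k] for w, p in zip(weights, points))
                   for k in range(len(x))]
        new_x = normalize(shifted)  # project the update back onto the sphere
        if 1.0 - dot(new_x, x) < tol:
            return new_x
        x = new_x
    return x


# Four points placed symmetrically around the north pole (0, 0, 1).
points = [normalize(p) for p in
          ([0.1, 0.0, 1.0], [-0.1, 0.0, 1.0], [0.0, 0.1, 1.0], [0.0, -0.1, 1.0])]
mode = directional_mean_shift(points, start=[0.3, 0.2, 1.0])
```

Running the iteration from every data point and grouping points by the mode they reach gives the mode clustering use mentioned in the abstract.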


Hierarchical regularization networks for sparsification based learning on noisy datasets

Shekhar, Prashant, Patra, Abani

arXiv.org Machine Learning

We propose a hierarchical learning strategy aimed at generating sparse representations and associated models for large noisy datasets. The hierarchy follows from approximation spaces identified at successively finer scales. To promote model generalization at each scale, we also introduce a novel, projection-based penalty operator across multiple dimensions, using permutation operators to incorporate proximity and ordering information. The paper presents a detailed analysis of approximation properties in the reconstruction Reproducing Kernel Hilbert Spaces (RKHS), with emphasis on the optimality and consistency of predictions and on the behavior of error functionals associated with the produced sparse representations. Results show the performance of the approach as a data reduction and modeling strategy on both synthetic (univariate and multivariate) and real (time series) datasets. The sparse model generated by the presented approach for the test datasets is also shown to efficiently reconstruct the underlying process and preserve generalizability.